1 Pre-Requisites (Covered in Week 3-5)
- Neural Networks: The Theory behind Neural Networks
- Back Propagation: Working back from output nodes to input nodes.
- Activation Functions: Sigmoid, Hyperbolic Tangent (tanh), ReLU, Softmax
2 What is a Recurrent Neural Network?
A Recurrent Neural Network (RNN) is a basic form of neural network designed to handle sequential data, such as time series or text in Natural Language Processing (NLP).
An RNN, like any other neural network, is made up of weights, biases, layers, and activation functions. An RNN also has an additional piece of functionality: the feedback loop. The key principle underlying RNNs is that they keep a “memory” of prior inputs, allowing them to make predictions based on both the present and previous inputs. The feedback loop is thus used to predict sequential input values over time.
2.1 Common types of RNN:
- One-to-One
- One-to-Many
- Many-to-One (e.g., Long Short-Term Memory (LSTM) Networks)
- Many-to-Many
2.2 Motivating Example: Predicting Stock Prices
Company A IPO’d (went public) 50 days ago, while Company B IPO’d (went public) 10 days ago. We want to use a neural network to forecast stock values for the two companies in such a way that:
- Each company’s input is independent. We want to use 50 days of data from Company A and 10 days of data from Company B.
- The Neural Network must be adaptable in terms of the amount of Sequential Data utilized to produce a forecast.
2.2.1 Neural Networks taken into consideration:
2.2.1.1 Single/Multiple Neural Networks:
The input size can only be fixed (e.g., 1 day, 5 days, etc.). This doesn’t make sense for predicting a company’s stock price, since it uses only a limited window of data: every data point is important in this case, and past data from earlier days can’t be ignored.
2.2.1.2 Recurrent Neural Network:
An RNN can be used to predict each company’s stock price from its own inputs independently. The feedback loop is used to predict sequential input values over time.
3 RNN Theory
Let us say that we have sequential data \(X = \{ x_{1},x_{2},x_{3},\ldots,\ x_{n}\}\) and we want to use an RNN to “learn” the pattern of the data in order to predict future data.
We can use a timestep of the data, of length \(T\), to predict the next \(K\) values in the dataset:
\[\text{use } {\overrightarrow{x}}^{t} = \begin{bmatrix} x_{t} \\ x_{t + 1} \\ \vdots \\ x_{t + T - 1} \\ \end{bmatrix} \text{ to predict } {\widetilde{y}}^{t} = \begin{bmatrix} {\widetilde{x}}_{t + T} \\ {\widetilde{x}}_{t + T + 1} \\ \vdots \\ {\widetilde{x}}_{t + T + K - 1} \\ \end{bmatrix}\]
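As a concrete sketch, this windowing can be written in base R; `make_windows` is a hypothetical helper (not from the text) that pairs each length-\(T\) input with its next \(K\) values:

```r
# Build sliding windows of length T from a series x, each paired with
# the next K values as the prediction target (illustrative helper).
make_windows <- function(x, T, K) {
  n <- length(x)
  starts <- 1:(n - T - K + 1)
  list(
    X = t(sapply(starts, function(t) x[t:(t + T - 1)])),          # inputs x^t
    Y = t(sapply(starts, function(t) x[(t + T):(t + T + K - 1)])) # targets y^t
  )
}

w <- make_windows(1:10, T = 3, K = 2)
w$X[1, ]  # 1 2 3
w$Y[1, ]  # 4 5
```

This is exactly the shape of data the Keras models in Section 6 are fed, where \(T\) is `num_steps` and \(K = 1\).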
We could use moving averages to do this; however, a moving average may not be accurate in predicting the next data points in the series.
Therefore, in RNN, there is a “memory element” that is used in the prediction, such that the previous data in the sequence affects the future predictions. Let us call that the hidden vector \({\overrightarrow{h}}^{t}\).
This “memory element” should also store some information about the prediction \({\widetilde{y}}^{t}\). So we can compute \({\widetilde{y}}^{t}\) as some linear combination of \({\overrightarrow{h}}^{t}\).
3.1 Simple Flow of RNN
With this flow in mind, we can construct the recursive equations for the hidden vector \({\overrightarrow{h}}^{t}\) and the prediction \({\widetilde{y}}^{t}\) as such:
\[{\overrightarrow{h}}^{t} = f\left( \overline{W}{\overrightarrow{h}}^{t - 1} + \overline{U}{\overrightarrow{x}}^{t} \right),\ \ {\widetilde{y}}^{t} = \overline{V}{\overrightarrow{h}}^{t}\]
Where we define the following Weight parameters and Vectors
Weight Parameters
\({\overline{W}}_{J \times J} \Rightarrow\) Weighting parameter for the previous hidden vector \({\overrightarrow{h}}^{t - 1}\)
\({\overline{U}}_{J \times T} \Rightarrow\) Weighting parameter for the timestep data \({\overrightarrow{x}}^{t}\)
\({\overline{V}}_{K \times J} \Rightarrow\) Weighting parameter for the hidden vector \({\overrightarrow{h}}^{t}\) to compute the prediction \({\widetilde{y}}^{t}\)
\({\overrightarrow{x}}^{t} \in \mathbb{R}^{T} \Rightarrow\) Timestep data
\({\overrightarrow{h}}^{t} \in \mathbb{R}^{J} \Rightarrow\) Hidden vector
\({\widetilde{y}}^{t} \in \mathbb{R}^{K} \Rightarrow\) Prediction
Dimensional Parameters
\(T \Rightarrow\) Length of Time step
\(K \Rightarrow\) Output Dimension
\(J \Rightarrow\) Dimension of hidden vector
3.1.1 Activation Function for RNN
Various activation functions can be used for the hidden layer in RNNs, such as the Sigmoid function and the Rectified Linear Unit (ReLU):
\[f(x) = \sigma(x) = \frac{1}{1 + \exp(-x)}\]
\[f(x) = \max(0, x)\]
For this example, we will be using the Sigmoid function, so the formulas for the hidden vector and the prediction become:
\[ \overrightarrow{h}^{t} = \sigma \left( \overline{W} \overrightarrow{h}^{t - 1} + \overline{U} \overrightarrow{x}^{t} \right) \] \[ {\widetilde{y}}^{t} = \overline{V}{\overrightarrow{h}}^{t} = \overline{V}\sigma\left( \overline{W}{\overrightarrow{h}}^{t - 1} + \overline{U}{\overrightarrow{x}}^{t} \right) \]
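A minimal base-R sketch of one step of this recursion, with illustrative (assumed) dimensions \(J\), \(T\), and \(K\):

```r
# One forward step of the RNN recursion: h_t = sigmoid(W h_{t-1} + U x_t),
# y_t = V h_t. Dimensions are illustrative, weights are random.
sigmoid <- function(x) 1 / (1 + exp(-x))

rnn_step <- function(h_prev, x_t, W, U, V) {
  h_t <- sigmoid(W %*% h_prev + U %*% x_t)  # hidden-state update
  y_t <- V %*% h_t                          # linear readout
  list(h = h_t, y = y_t)
}

set.seed(1)
J <- 4; Tlen <- 3; K <- 2
W <- matrix(rnorm(J * J), J, J)      # J x J
U <- matrix(rnorm(J * Tlen), J, Tlen) # J x T
V <- matrix(rnorm(K * J), K, J)      # K x J
out <- rnn_step(rep(0, J), rnorm(Tlen), W, U, V)
dim(out$y)  # 2 1
```

The hidden vector stays in \((0,1)\) component-wise because of the sigmoid, which is what lets it act as a bounded “memory”.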
3.2 RNN Architecture
The RNN architecture is thus as follows:
3.2.1 Back Propagation Through Time (BPTT)
The weight matrices \({\overline{V}}_{K \times J}, {\overline{U}}_{J \times T}\), and \(\overline{W}_{J \times J}\) are updated during training using Back propagation through time (BPTT), which is a variant of back propagation that is used for training recurrent networks. During BPTT, the gradients are calculated recursively through the time steps, allowing the network to learn the temporal dependencies in the data.
Based on the predicted values \({\widetilde{y}}^{t}\) and actual values \({\overrightarrow{y}}^{t} = \begin{bmatrix} x_{t + T} \\ x_{t + T + 1} \\ \vdots \\ x_{t + T + K - 1} \\ \end{bmatrix}\), we can compute the loss function \(L\) as follows:
\[L = \frac{1}{2}\sum_{i = 1}^{n - T + 1}\left( {\widetilde{y}}_{i} - y_{i} \right)^{2}\]
In finding the partial derivatives, it is worth noting that the derivative of the sigmoid function can be simplified as:
\[\sigma^{'}(x) = \sigma(x)\left( 1 - \sigma(x) \right) \\ \text{ } \\ {\overrightarrow{h'}}^{t} = {\overrightarrow{h}}^{t}\left( 1 - {\overrightarrow{h}}^{t} \right)\]
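This identity can be checked numerically against a central finite difference:

```r
# Verify sigma'(x) = sigma(x) * (1 - sigma(x)) numerically.
sigmoid <- function(x) 1 / (1 + exp(-x))

x <- seq(-5, 5, by = 0.5)
numeric_grad <- (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6  # central difference
identity_val <- sigmoid(x) * (1 - sigmoid(x))                   # the identity
max(abs(numeric_grad - identity_val))  # ~0 (agreement to ~1e-10)
```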
From the loss function, we can thus compute the following partial derivatives of the weighting parameters, \(\frac{\partial L}{\partial\overline{W}}\), \(\frac{\partial L}{\partial\overline{U}}\), \(\frac{\partial L}{\partial\overline{V}}\):
\[\frac{dL_{i}}{dV_{\alpha\beta}} = \frac{\partial L_{i}}{\partial{\widetilde{y}}_{j}}\frac{\partial{\widetilde{y}}_{j}}{\partial V_{\alpha\beta}} = \left( {\widetilde{y}}_{i} - y_{i} \right)h_{k}\]
\[\frac{dL}{d\overline{V}} = \sum_{i = 1}^{n - T + 1}{\left( {\widetilde{y}}_{i} - y_{i} \right)h_{k}}\]
\[\frac{dL_{i}}{dU_{\alpha\beta}} = \frac{\partial L_{i}}{\partial{\widetilde{y}}_{j}}\frac{\partial{\widetilde{y}}_{j}}{\partial h_{k}}\frac{\partial h_{k}}{\partial U_{\alpha\beta}} = \left( {\widetilde{y}}_{i} - y_{i} \right)\left( V_{ij} \right)\left( f'\left( \overline{W}{\overrightarrow{h}}^{i - 1} + \overline{U}{\overrightarrow{x}}^{i} \right) \right)\left( {\overrightarrow{x}}^{i} \right) = \left( {\widetilde{y}}_{i} - y_{i} \right)\left( V_{ij} \right)\left( {\overrightarrow{h}}^{i}\left( 1 - {\overrightarrow{h}}^{i} \right) \right)\left( {\overrightarrow{x}}^{i} \right)\]
\[\frac{dL}{d\overline{U}} = \sum_{i = 1}^{n - T + 1}{\left( {\widetilde{y}}_{i} - y_{i} \right)\left( V_{ij} \right)\left( {\overrightarrow{h}}^{i}\left( 1 - {\overrightarrow{h}}^{i} \right) \right)\left( {\overrightarrow{x}}^{i} \right)}\]
\[\frac{dL_{i}}{dW_{\alpha\beta}} = \frac{\partial L_{i}}{\partial{\widetilde{y}}_{j}}\frac{\partial{\widetilde{y}}_{j}}{\partial h_{k}}\frac{\partial h_{k}}{\partial W_{\alpha\beta}} = \left( {\widetilde{y}}_{i} - y_{i} \right)\left( V_{ij} \right)\left( f'\left( \overline{W}{\overrightarrow{h}}^{i - 1} + \overline{U}{\overrightarrow{x}}^{i} \right) \right)\left( {\overrightarrow{h}}^{i - 1} \right)\]
\[\frac{dL}{d\overline{W}} = \sum_{i = 1}^{n - T + 1}{\left( {\widetilde{y}}_{i} - y_{i} \right)\left( V_{ij} \right)\left( {\overrightarrow{h}}^{i}\left( 1 - {\overrightarrow{h}}^{i} \right) \right)\left( {\overrightarrow{h}}^{i - 1} \right)}\]
3.2.1.1 Gradient Descent
\[W_{\alpha\beta} \rightarrow W_{\alpha\beta} - \varepsilon\frac{dL}{dW_{\alpha\beta}}\]
\[U_{\alpha\beta} \rightarrow U_{\alpha\beta} - \varepsilon\frac{dL}{dU_{\alpha\beta}}\]
\[V_{\alpha\beta} \rightarrow V_{\alpha\beta} - \varepsilon\frac{dL}{dV_{\alpha\beta}}\]
3.3 Algorithm
Initialize Parameters
\(\overline{W} = 0, \overline{V} = 0, \overline{U} = 0\)
\(\overrightarrow{h}^{0} = 0\)
Iterate for N epochs:
For t = 1 to n - T + 1:
Get \({\overrightarrow{x}}^{t} = \begin{bmatrix} x_{t} \\ x_{t + 1} \\ \vdots \\ x_{t + T - 1} \\ \end{bmatrix}\)
Find the hidden vector from previous hidden vector
\[{\overrightarrow{h}}^{t} = f\left( \overline{W}{\overrightarrow{h}}^{t - 1} + \overline{U}{\overrightarrow{x}}^{t} \right)\]
Compute predictions \[{\widetilde{y}}_{t} = \overline{V}{\overrightarrow{h}}^{t} = \overline{V}f\left( \overline{W}{\overrightarrow{h}}^{t - 1} + \overline{U}{\overrightarrow{x}}^{t} \right)\ \forall t = 1,\ 2,\ ...n - T + 1\]
Compute Partial Derivatives and update Parameters
\[\frac{dL}{d\overline{V}} = \sum_{t = 1}^{n - T + 1}{\left( {\widetilde{y}}_{t} - y_{t} \right){\overrightarrow{h}}^{t}}\]
\[\overline{V} \rightarrow \overline{V} - \varepsilon\frac{dL}{d\overline{V}}\]
\[\frac{dL}{d\overline{U}} = \sum_{t = 1}^{n - T + 1}{\left( {\widetilde{y}}_{t} - y_{t} \right)\left( V_{tj} \right)\left( {\overrightarrow{h}}^{t}\left( 1 - {\overrightarrow{h}}^{t} \right) \right)\left( {\overrightarrow{x}}^{t} \right)}\]
\[\overline{U} \rightarrow \overline{U} - \varepsilon\frac{dL}{d\overline{U}}\]
\[\frac{dL}{d\overline{W}} = \sum_{t = 1}^{n - T + 1}{\left( {\widetilde{y}}_{t} - y_{t} \right)\left( V_{tj} \right)\left( {\overrightarrow{h}}^{t}\left( 1 - {\overrightarrow{h}}^{t} \right) \right)\left( \ {\overrightarrow{h}}^{t - 1} \right)}\] \[\overline{W} \rightarrow \overline{W} - \varepsilon\frac{dL}{d\overline{W}}\]
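The algorithm above can be sketched in base R for the scalar-output case (\(K = 1\)), using the one-step (truncated) gradient updates derived earlier. One labeled deviation from the text: the weights start at small random values rather than zero, since with an all-zero initialization every hidden unit would receive identical updates.

```r
# Minimal training loop for the scalar-output RNN described above.
sigmoid <- function(x) 1 / (1 + exp(-x))

train_rnn <- function(x, Tlen = 4, J = 6, epochs = 200, eps = 0.02) {
  set.seed(42)
  W <- matrix(rnorm(J * J, sd = 0.1), J, J)       # hidden-to-hidden
  U <- matrix(rnorm(J * Tlen, sd = 0.1), J, Tlen) # input-to-hidden
  V <- matrix(rnorm(J, sd = 0.1), 1, J)           # hidden-to-output
  n <- length(x)
  losses <- numeric(epochs)
  for (epoch in 1:epochs) {
    h_prev <- rep(0, J)
    for (t in 1:(n - Tlen)) {
      xt  <- x[t:(t + Tlen - 1)]                      # timestep window
      h   <- as.vector(sigmoid(W %*% h_prev + U %*% xt))
      err <- as.vector(V %*% h) - x[t + Tlen]         # ytilde - y
      losses[epoch] <- losses[epoch] + 0.5 * err^2
      dh <- as.vector(t(V)) * err * h * (1 - h)       # backprop through sigmoid
      V  <- V - eps * err * h                         # dL/dV
      U  <- U - eps * outer(dh, xt)                   # dL/dU
      W  <- W - eps * outer(dh, h_prev)               # dL/dW
      h_prev <- h
    }
  }
  losses
}

loss <- train_rnn(sin(seq(0, 6 * pi, length.out = 80)))
# expect the per-epoch loss to decrease over training
```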
4 Drawbacks of the Basic RNN
Although RNNs are a powerful tool for processing sequential data, they are not without their challenges. One major issue with RNNs is the vanishing and exploding gradient problem.
When training an RNN, the weights of the network are updated through backpropagation, which involves computing gradients of the loss with respect to the weights at each timestep. The gradients are then used to update the weights in the direction that minimizes the loss. However, the gradients can become very small or very large as they are propagated through the network, which can cause the weight updates to be too small or too large, leading to slow convergence or divergence of the training process.
4.1 The Gradient Problem
4.1.1 The Vanishing Gradient Problem (factor less than 1)
The vanishing gradient problem occurs when the partial derivative of the loss function with respect to the weights, \(\frac{\partial L}{\partial w_{ij}}\), becomes very small as it is propagated backwards through time, i.e., as \(t\) decreases. This can happen because the derivative of the activation function \(\frac{\partial z_t}{\partial w_{ij}}\) can become very small. When the gradient becomes too small, it can cause the weights to be updated very slowly or not at all, which can lead to poor convergence and slow learning.
4.1.2 The Exploding Gradient Problem (factor greater than 1)
The exploding gradient problem occurs when the partial derivative of the loss function with respect to the weights, \(\frac{\partial L}{\partial w_{ij}}\), becomes very large as it is propagated backwards through time, i.e., as \(t\) decreases. This can happen because the derivative of the activation function \(\frac{\partial z_t}{\partial w_{ij}}\) can become very large. When the gradient becomes too large, it can cause the weights to be updated very aggressively, resulting in oscillations or divergence in the optimization process. This can make it difficult or impossible for the network to converge to a good solution.
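The effect is easy to see with repeated multiplication: backpropagating through \(t\) time steps multiplies the gradient by roughly the same factor at each step, so a factor below 1 shrinks towards zero while a factor above 1 grows without bound:

```r
# Repeated multiplication behind the vanishing/exploding gradient problem.
t_steps <- 1:50
vanish  <- 0.9 ^ t_steps   # factor < 1: gradient shrinks towards 0
explode <- 1.1 ^ t_steps   # factor > 1: gradient grows without bound
tail(vanish, 1)   # ~0.005
tail(explode, 1)  # ~117
```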
4.2 How to Overcome This? Long Short-Term Memory (LSTM)
LSTMs are a type of RNN that use memory cells and gates to control the flow of information, which helps to mitigate the vanishing gradient problem.
5 Understanding Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is an RNN variant that overcomes the vanishing gradient problem by incorporating memory cells that can retain data for an extended period of time. The input gate, forget gate, and output gate of the LSTM network govern the flow of information into and out of the memory cells.
The memory cell is controlled by three gates:
- The Input Gate selects which data from the input should be saved in the memory cell
- The Forget Gate determines which data should be erased from the memory cell.
- The Output Gate governs the output of the LSTM cell based on the current input and the information stored in the memory cell.
5.1 LSTM Architecture
5.1.1 LSTM Functions
Information passes through an LSTM network via a set of linked nodes known as LSTM cells. Each LSTM cell contains numerous components that work together to govern information flow across the network. These elements are as follows:
5.1.1.1 Input Gate
The input gate determines which information from the input should be stored in the memory cell. It is controlled by a sigmoid activation function, which takes the current input and the previous output as inputs and outputs a value between 0 and 1. The sigmoid function is defined as: \[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
5.1.1.2 Forget Gate
The forget gate decides which information should be removed from the memory cell. It is also controlled by a sigmoid activation function, which takes the current input and the previous output as inputs and outputs a value between 0 and 1. The forget gate function is defined as: \[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
5.1.1.3 Output Gate
The output gate controls the output of the LSTM cell based on the current input and the stored information in the memory cell. The gate itself uses a sigmoid activation (like the input and forget gates), and the cell state it releases is squashed between -1 and 1 by a hyperbolic tangent function, defined as: \[ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \]
5.1.1.4 Memory Cell
The memory cell stores information for a long time, and its content is controlled by the input and forget gates. The memory cell is updated using a combination of the current input, the previous output, and the previous memory cell content. The memory cell update function is defined as: \[ C_t = f_t C_{t-1} + i_t \tilde{C}_t \] where
- \(C_t\) is the current memory cell content,
- \(f_t\) is the forget gate output,
- \(i_t\) is the input gate output, and
- \(\tilde{C}_t\) is the candidate cell content.
5.1.1.5 Candidate Cell
The candidate cell represents the new information that could be added to the memory cell. It is computed using the current input and the previous output, and is controlled by a hyperbolic tangent activation function. The candidate cell function is defined as: \[ \tilde{C}_t = \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) \] where
- \(W_{cx}\) and \(W_{ch}\) are weight matrices,
- \(x_t\) is the current input,
- \(h_{t-1}\) is the previous output, and
- \(b_c\) is the bias term.
5.1.2 Activation Functions
In addition to the activation functions used for the gates and the candidate cell, the LSTM network also uses activation functions for the output layer. The choice of activation function depends on the task at hand, but commonly used activation functions include:
5.1.2.1 Linear
\[ f(x) = Mx + c \]
5.1.2.2 ReLU
\[ \text{ReLU}(x) = \max(0, x) \]
5.1.2.3 Sigmoid
\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
5.1.2.4 Hyperbolic Tangent
\[ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \]
5.1.2.5 Softmax
\[ \sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \quad \text{for } j = 1,\dots,K \]
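For reference, the activation functions listed above can be written as base-R one-liners (`tanh()` is built in):

```r
# The output-layer activation functions from the list above.
linear  <- function(x, M = 1, c = 0) M * x + c
relu    <- function(x) pmax(0, x)
sigmoid <- function(x) 1 / (1 + exp(-x))
softmax <- function(z) exp(z) / sum(exp(z))  # normalizes to a probability vector

relu(c(-2, 3))     # 0 3
sigmoid(0)         # 0.5
sum(softmax(1:3))  # 1
```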
5.2 LSTM Stages
LSTM is designed to remember or forget information selectively based on current input and context learned from previous inputs. During training, LSTM learns to assign weights to the input and previous cell state, deciding how much of each piece of information to keep in the current cell state using sigmoid and tanh activation functions. It can also decide how much of the previous cell state to forget based on the current input and context using another sigmoid activation function. This selective memory allows LSTMs to process sequential data and model complex temporal dependencies, making them useful in generating meaningful outputs.
5.2.1 Stage 1: The Percentage to Remember
In Stage 1, the LSTM cell decides what percentage of the current input and what percentage of the previous context to remember for the current time step.
To achieve this, the LSTM cell uses three types of gates: input gate, forget gate, and output gate. These gates control the flow of information through the LSTM cell by using sigmoid and element-wise multiplication functions.
Let’s define some terms:
- \(x_t\): the current input at time step \(t\)
- \(h_{t-1}\): the previous context (or output) at time step \(t-1\)
- \(i_t\): the input gate activation vector at time step \(t\)
- \(f_t\): the forget gate activation vector at time step \(t\)
- \(o_t\): the output gate activation vector at time step \(t\)
- \(C_t\): the cell state vector (or “memory”) at time step \(t\)
- \(W_i, W_f, W_o\): weight matrices for the input, forget, and output gates, respectively
- \(b_i, b_f, b_o\): bias vectors for the input, forget, and output gates, respectively
- \(\sigma\): sigmoid activation function
- \(\odot\): element-wise multiplication
Now, let’s define the equations for each of the three gates and the cell state update in Stage 1:
5.2.1.1 Input gate
The Input gate determines what percentage of the current input to remember
Equation: \[ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \]
5.2.1.2 Forget gate
The Forget gate determines what percentage of the previous context to forget
Equation: \[ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \]
5.2.1.3 Output gate
The Output gate determines what percentage of the current memory to output as context
Equation: \[ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \]
5.2.1.4 Cell state update
The Cell state update combines the input gate activation vector, forget gate activation vector, and previous cell state to update the current cell state
Equation: \[ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \]
where
\[ \tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \]
Here, \(W_c\) is the weight matrix for the cell state update, and \(b_c\) is the bias vector for the cell state update.
The input, forget, and output gates are determined using sigmoid activation functions in Stage 1 of the LSTM process to decide what fraction of the current input, past context, and current memory to employ. To update the current cell state, the cell state update is calculated using element-wise multiplication and the hyperbolic tangent function.
5.2.2 Stage 2: Update the Long-Term Memory
In Stage 2 of the LSTM architecture, we update the long-term memory by combining the new information from the input with the previous memory.
First, we compute the new candidate values, \(\tilde{C_t}\), to be added to the memory:
\[\tilde{C_t} = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)\]
where \(W_c\) is the weight matrix for the cell state update, and \(b_c\) is the bias vector for the cell state update.
Next, we update the memory by selectively forgetting some of the previous memory and adding the new candidate values:
\[C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C_t}\]
where \(\odot\) denotes element-wise multiplication, \(f_t\) is the forget gate activation vector, \(i_t\) is the input gate activation vector, and \(C_{t-1}\) is the previous cell state.
Finally, we compute the output of the LSTM cell at this time step, \(h_t\), by passing the updated memory through the output gate:
\[h_t = o_t \odot \tanh(C_t)\]
where \(o_t\) is the output gate activation vector.
In summary, in Stage 2 of the LSTM architecture, we update the long-term memory by selectively forgetting some of the previous memory and adding new information from the input, and then compute the output of the LSTM cell based on the updated memory.
5.2.3 Stage 3: Update the Short-Term Memory
In Stage 3 of the LSTM process, the updated cell state is used to calculate the output at the current time step, which becomes the new context for the next time step.
\[ h_t = o_t \odot \tanh(C_t) \] where
- \(h_t\): the current output (or context) at time step t
- \(o_t\): the output gate activation vector at time step t
- \(C_t\): the updated cell state vector (or “memory”) at time step t
- \(\tanh\): hyperbolic tangent activation function
Here, the output gate activation vector \(o_t\) determines what percentage of the current memory to output as context, and the hyperbolic tangent function is applied to the updated cell state \(C_t\) to obtain the output at the current time step.
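Putting the three stages together, a single LSTM cell step can be sketched in base R. The dimensions and random weights below are purely illustrative, and the weight matrices act on the concatenation \([h_{t-1}, x_t]\) as in the equations above:

```r
# One LSTM cell step: gates (Stage 1), cell-state update (Stage 2),
# and output (Stage 3). Dimensions: input size 1, hidden size J.
lstm_step <- function(x_t, h_prev, C_prev, Wi, Wf, Wo, Wc, bi, bf, bo, bc) {
  z <- c(h_prev, x_t)                        # concatenation [h_{t-1}, x_t]
  i_t <- 1 / (1 + exp(-(Wi %*% z + bi)))     # input gate
  f_t <- 1 / (1 + exp(-(Wf %*% z + bf)))     # forget gate
  o_t <- 1 / (1 + exp(-(Wo %*% z + bo)))     # output gate
  C_tilde <- tanh(Wc %*% z + bc)             # candidate cell content
  C_t <- f_t * C_prev + i_t * C_tilde        # Stage 2: update long-term memory
  h_t <- o_t * tanh(C_t)                     # Stage 3: update short-term memory
  list(h = as.vector(h_t), C = as.vector(C_t))
}

set.seed(7)
J <- 3
mk <- function() matrix(rnorm(J * (J + 1), sd = 0.5), J, J + 1)
out <- lstm_step(x_t = 0.2, h_prev = rep(0, J), C_prev = rep(0, J),
                 mk(), mk(), mk(), mk(),
                 rep(0, J), rep(0, J), rep(0, J), rep(0, J))
out$h  # new context; each component stays in (-1, 1)
```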
Using the LSTM process in this way allows for long-term dependencies to be captured and stored in the memory, while also selectively forgetting irrelevant information and updating the memory based on new inputs. This makes LSTMs particularly useful for tasks such as speech recognition, language translation, and sentiment analysis.
6 Stock Prediction Model using RNN and LSTM
6.1 Read csv File
- Dataset: Google Stock Data from Yahoo Finance (Last 5 Years)
- We will use roughly the first 70% of the data to train the model and the remaining 30% to test it.
library(dplyr)     # arrange(), select(), %>%
library(lubridate) # ymd()
library(keras)     # RNN / LSTM models
df <- read.csv("GoogleStockData.csv", stringsAsFactors = FALSE)
head(df)
## Date Open High Low Close Volume
## 1 03/01/2023 90.16 91.20 89.85 90.51 26,323,881
## 2 02/28/2023 89.54 91.45 89.52 90.30 30,546,910
## 3 02/27/2023 90.09 90.45 89.61 90.10 22,724,260
## 4 02/24/2023 89.63 90.13 88.86 89.35 31,295,619
## 5 02/23/2023 92.13 92.13 90.01 91.07 32,423,721
## 6 02/22/2023 91.93 92.36 90.87 91.80 29,891,141
6.2 Dataset Modifications
6.2.1 Reformat the Date Column
- Change Date to desired format (from “mm/dd/yyyy” to “yyyy-mm-dd”)
- Arrange date in ascending order
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")
df$Date <- format(df$Date, "%Y-%m-%d")
df <- arrange(df, Date)
head(df)
## Date Open High Low Close Volume
## 1 2018-03-01 55.39 55.51 53.35 53.48 50,318,200
## 2 2018-03-02 52.65 54.10 52.41 53.95 45,431,020
## 3 2018-03-05 53.76 54.86 53.45 54.55 24,043,480
## 4 2018-03-06 54.96 55.09 54.49 54.75 30,655,660
## 5 2018-03-07 54.46 55.61 54.27 55.48 25,850,740
## 6 2018-03-08 55.77 56.38 55.64 56.30 27,102,500
6.2.2 Find Google’s Max Stock Value
We will use this value in our analysis later.
max_close <- max(df$Close)
cat("The Max Value of the Google Stock Price is", max_close)
## The Max Value of the Google Stock Price is 150.71
6.2.3 Normalise the dataset
- Select the Date and Close Column
- Convert the Date column into Date format
- Normalize the Close column
df <- df %>%
select(Date, Close)
df$Date <- ymd(df$Date)
df$Close <- scales::rescale(df$Close)
6.3 Split the Dataset into Training and Test Sets
set.seed(123)
train_size <- floor(0.7 * nrow(df))
train_data <- df[1:train_size, ]
test_data <- df[(train_size + 1):nrow(df), ]
6.4 Define and Compile the RNN and LSTM Models
- Define the number of time steps in the model
- Define the RNN and LSTM Models, each with one hidden layer (200 units) and an output layer (1 unit)
- Compile the Models
num_steps <- 200
RNN_model <- keras_model_sequential() %>%
layer_simple_rnn(units = 200, input_shape = list(num_steps, 1)) %>%
layer_dense(units = 1)
RNN_model %>% summary()
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## simple_rnn (SimpleRNN) (None, 200) 40400
## dense (Dense) (None, 1) 201
## ================================================================================
## Total params: 40,601
## Trainable params: 40,601
## Non-trainable params: 0
## ________________________________________________________________________________
RNN_model %>% compile(
loss = "mean_squared_error",
optimizer = optimizer_adam(),
metrics = list("mean_absolute_error")
)
num_steps <- 200
LSTM_model <- keras_model_sequential() %>%
layer_lstm(units = 200, input_shape = c(num_steps, 1)) %>%
layer_dense(units = 1)
LSTM_model %>% summary()
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## lstm (LSTM) (None, 200) 161600
## dense_1 (Dense) (None, 1) 201
## ================================================================================
## Total params: 161,801
## Trainable params: 161,801
## Non-trainable params: 0
## ________________________________________________________________________________
LSTM_model %>% compile(
loss = "mean_squared_error",
optimizer = optimizer_adam()
)
6.5 Train the Models using the Training Dataset
# Prepare the training data
train_x <- array(0, dim = c(nrow(train_data) - num_steps, num_steps, 1))
train_y <- array(0, dim = c(nrow(train_data) - num_steps, 1))
for (i in 1:(nrow(train_data) - num_steps)) {
train_x[i,,] <- train_data[i:(i + num_steps - 1), "Close"]
train_y[i,] <- train_data[i + num_steps, "Close"]
}
# Train the RNN model
RNN_history <- RNN_model %>% fit(
train_x, train_y,
epochs = 100,
batch_size = 32
)
# Plot the training history
plot(RNN_history)
# Train the LSTM model
LSTM_history <- LSTM_model %>% fit(
train_x, train_y,
epochs = 100,
batch_size = 32
)
plot(LSTM_history)
6.6 Test the Models and Calculate the Accuracy
# Prepare the test data
test_x <- array(0, dim = c(nrow(test_data) - num_steps, num_steps, 1))
test_y <- array(0, dim = c(nrow(test_data) - num_steps, 1))
for (i in 1:(nrow(test_data) - num_steps)) {
test_x[i,,] <- test_data[i:(i + num_steps - 1), "Close"]
test_y[i,] <- test_data[i + num_steps, "Close"]
}
# Use the trained RNN model to make predictions on the test data
RNN_predictions <- RNN_model %>% predict(test_x)
# Accuracy of the RNN model
RNN_accuracy <- 1 - mean(abs(RNN_predictions - test_y)/test_y)
RNN_accuracy_percent <- round(RNN_accuracy * 100, 2)
cat("Accuracy of the RNN Model: ", RNN_accuracy_percent, "%\n")
## Accuracy of the RNN Model: 94.16 %
# Use the trained LSTM model to make predictions on the test data
LSTM_predictions <- LSTM_model %>% predict(test_x)
# Accuracy of the LSTM model
LSTM_accuracy <- 1 - mean(abs(LSTM_predictions - test_y)/test_y)
LSTM_accuracy_percent <- round(LSTM_accuracy * 100, 2)
cat("Accuracy of the LSTM Model: ", LSTM_accuracy_percent, "%\n")
## Accuracy of the LSTM Model: 95.53 %